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Abstract 

This paper focuses on the consensus averaging problem on graphs under general noisy chan- 
nels. We study a particular class of distributed consensus algorithms based on damped updates, 
and using the ordinary differential equation method, we prove that the updates converge al- 
most surely to exact consensus for finite variance noise. Our analysis applies to various types 
of stochastic disturbances, including errors in parameters, transmission noise, and quantization 
noise. Under a suitable stability condition, we prove that the error is asymptotically Gaussian, 
and we show how the asymptotic covariance is specified by the graph Laplacian. For additive 
parameter noise, we show how the scaling of the asymptotic MSE is controlled by the spectral 
gap of the Laplacian. 

Keywords: Distributed averaging; sensor networks; message-passing; consensus protocols; 
gossip algorithms; stochastic approximation; graph Laplacian. 

1 Introduction 

Consensus problems, in which a group of nodes want to arrive at a common decision in a distributed 
manner, have a lengthy history, dating back to seminal work from over twenty years ago [8l [5l I18j. 
A particular type of consensus estimation is the distributed averaging problem, in which a group 
of nodes want to compute the average (or more generally, a linear function) of a set of values. Due 
to its applications in sensor and wireless networking, this distributed averaging problem has been 
the focus of substantial recent research. The distributed averaging problem can be studied either 
in continuous-time [l6], or in the discrete-time setting (e.g., [I3l [HI El EJ [9]). In both cases, there 
is now a fairly good understanding of the conditions under which various distributed averaging 
algorithms converge, as well as the rates of convergence for different graph structures. 

The bulk of early work on consensus has focused on the case of perfect communication between 
nodes. Given that noiseless communication may be an unrealistic assumption for sensor networks, 
a more recent line of work has addressed the issue of noisy communication links. With imperfect 
observations, many of the standard consensus protocols might fail to reach an agreement. Xiao et 
al. [20] observed this phenomenon, and opted to instead redefine the notion of agreement, obtaining 
a protocol that allows nodes to obtain a steady-state agreement, whereby all nodes are able to 
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track but need not obtain consensus agreement. Schizas et al. [17] study distributed algorithms 
for optimization, including the consensus averaging problem, and establish stability under noisy 
updates, in that the iterates are guaranteed to remain within a ball of the correct consensus, but 
do not necessarily achieve exact consensus. Kashyap et al. [12] study consensus updates with the 
additional constraint that the value stored at each node must be integral, and establish convergence 
to quantized consensus. Fagnani and Zampieri [10] study the case of packet-dropping channels, and 
propose various updates that are guaranteed to achieve consensus. Yildiz and Scaglione [21] suggest 
coding strategies to deal with quantization noise, but do not establish convergence. In related work, 
Aysal et al [3] used probabilistic forms of quantization to develop algorithms that achieve consensus 
in expectation, but not in an almost sure sense. 

In the current paper, we address the discrete-time average consensus problem for general 
stochastic channels. Our main contribution is to propose and analyze simple distributed proto- 
cols that are guaranteed to achieve exact consensus in an almost sure (sample-path) sense. These 
exactness guarantees are obtained using protocols with decreasing step sizes, which smooths out the 
noise factors. The framework described here is based on the classic ordinary differential equation 
method [H], and allows for the analysis of several different and important scenarios, namely: 

• Noisy storage: stored values at each node are corrupted by noise, with known covariance 
structure. 

• Noisy transmission: messages across each edge are corrupted by noise, with known covariance 
structure. 

• Bit constrained channels: dithered quantization is applied to messages prior to transmission. 

To the best of our knowledge, this is the first paper to analyze protocols that can achieve arbitrarily 
small mean-squared error (MSE) for distributed averaging with noise. By using stochastic approx- 
imation theory [HH^i we establish almost sure convergence of the updates, as well as asymptotic 
normality of the error under appropriate stability conditions. The resulting expressions for the 
asymptotic variance reveal how different graph structures — ranging from ring graphs at one ex- 
treme, to expander graphs at the other — lead to different variance scaling behaviors, as determined 
by the eigenspectrum of the graph Laplacian [7]. 

The remainder of this paper is organized as follows. We begin in Section [2] by describing the 
distributed averaging problem in detail, and defining the class of stochastic algorithms studied in 
this paper. In Section [3l we state our main results on the almost-sure convergence and asymp- 
totic normality of our protocols, and illustrate some of their consequences for particular classes 
of graphs. In particular, we illustrate the sharpness of our theoretical predictions by comparing 
them to simulation results, on various classes of graphs. Section U] is devoted to the proofs of our 
main results, and we conclude the paper with discussion in Section [5l (This work was presented 
in part at the Allerton Conference on Control, Computing and Communication in September 2007.) 

Comment on notation: Throughout this paper, we use the following standard asymptotic no- 
tation: for a functions / and g, the notation f{n) = 0{g{n)) means that /(n) < Cg{n) for some 
constant C < oo; the notation f{n) = ^}{g{n)) means that f{n) > C'g{n) for some constant C > 0, 
and f{n) = Q{g{n)) means that /(n) = 0{g{n)) and f{n) = ^}{g{n)). 
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2 Problem set-up 



In this section, we describe the distributed averaging problem, and specify the class of stochastic 
algorithms studied in this paper. 

2.1 Consensus matrices and stochastic updates 

Consider a set of m = \ V\ nodes, each representing a particular sensing and processing device. We 
model this system as an undirected graph G = (V, E), with processors associated with nodes of the 
graph, and the edge set E C V x V representing pairs of processors that can communicate directly. 
For each node v, we let N{v) : = {u & V \ {v,u) G E} be its neighborhood set. 





u 



o 





e{t) 

Figure 1. Illustration of the distributed protocol. Each node t maintains an estimate 9{t). At 
each round, for a fixed reference node r £ V, each neighbor t G N{r) sends the message JF(6'*, r)) 
along the edge t r. 



Suppose that each vertex v makes a real- valued measurement x{v), and consider the goal of 
computing the average x = ^ '^v&v ^i"^)- We assume that \x{v)\ < Xmax for all v G V, as dictated 
by physical constraints of sensing. For iterations n = 0, 1, 2, . . ., let 6^ = {9"'{v), v E V} represent 
an m-dimensional vector of estimates. Solving the distributed averaging problem amounts to having 
0" converge to 9* : = xl, where 1 G is the vector of all ones. Various algorithms for distributed 
averaging [6] are based on symmetric consensus matrices L G ]^™-x"« with the properties: 

L{v,v') / only if {v,v') £ E (la) 
LI = 0, and (lb) 
L h 0- (Ic) 



tmxm 



The simplest example of such a matrix is the graph Laplacian, defined as follows. Let A G 
be the adjacency matrix of the graph G, i.e. the symmetric matrix with entries 

A, = I' ^f^^'^')^^ (2) 
I otherwise, 
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and let D = diagjdi, (i2, . . . , dm} where di : = | A^(i)| is the degree of node i. Assuming that the 
graph is connected (so that > 1 for all i), the graph Laplacian is given by 



L{G) 



(3) 



Our analysis applies to the (rescaled) graph Laplacian, as well as to various weighted forms of graph 
Laplacian matrices [7]. 

Given a fixed choice of consensus matrix L, we consider the following family of updates, gen- 
erating the sequence {6'"', n = 0, 1, 2 . . .} of m-dimensional vectors. The updates are designed to 
respect the neighborhood structure of the graph G, in the sense that at each iteration, the estimate 
^"■^^(r) at a receiving node r G F is a function of onlj0 the estimates {9^{t), t G -/V(r)} associated 
with transmitting nodes t in the neighborhood of node r. In order to model noise and uncertainty 
in the storage and communication process, we introduce random variables £,{t,r) associated with 
the transmission link from t to r; we allow for the possibility that £,{t,r) ^ ^{r,t), since the noise 
structure might be asymmetric. 

With this set-up, we consider algorithms that generate a stochastic sequence {0",n = O,l,2,...} 
in the following manner: 



1. At time step n = 0, initialize 6^{v) = x{v) for all v £ V. 

2. For time steps n = 0, 1, 2, . . ., each node t £ V computes the random variables 



y"+i(r,t) = { 



ift = r 

.F(^"(t),r+^(t,r)) ii{t,r)eE 



(4) 







otherwise. 



where J- is the communication-noise function defining the model. 
3. Generate estimate 6*"+^ G M™ as 



+ e„ 



(L y"+^) 1 



(5) 



where denotes the Hadamard (elementwise) product between matrices, and e„ > is 
a decaying step size parameter. 



See Figure[T]for an illustration of the message-passing update of this protocol. In this paper, we 
focus on step size parameters that scale as = 0(l/n). On an elementwise basis, the update ([5]) 
takes the form 



(r) 



L{r,r)e''{r) + 



E 

teNir) 



L{r,t) T{e^{t),e^\t,r)) 



^In fact, our analysis is easily generalized to the case where 0""'"^(r) depends only on vertices t £ N'(r), where 
N'{r) is a (possibly random) subset of the full neighborhood set N{v). However, to bring our results into sharp focus, 
we restrict attention to the case N'{r) = N{r). 
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2.2 Communication and noise models 



It remains to specify the form of the the function T that controls the communication and noise 
model in the local computation step in equation (jj]). 

Noiseless real number model: The simplest model, as considered by the bulk of past work on 
distributed averaging, assumes noiseless communication of real numbers. This model is a special 
case of the update ^ with (^(t^r^ = 0, and 

j^{e^{t),c^\t,T)) = e^{t). (6) 

Additive edge-based noise model (AEN): In this model, the term ^"(t, r) is zero- mean additive 
random noise variable that is associated with the transmission t ^ r, and the communication 
function takes the form 

T{e^{t),c-'\t,r)) = e^it) + c^Ht,r). (7) 

We assume that the random variables ■^""''^(i, r) and {t' , r) are independent for distinct edges 
{t',r) and (t,r), and identically distributed with zero-mean and variance cr^ = Var(^"+^(i, r)). 

Additive node-based noise model (ANN): In this model, the function takes the same 
form ([7]) as the edge-based noise model. However, the key distinction is that for each v' £ V, we 
assume that 

C+\t,r) = C^\t) for all r G A(t), (8) 

where C"'~^^{'t) is a single noise variable associated with node t, with zero mean and variance 
0"^ = Var(^"(t)). Thus, the random variables S,"~^^{t,r) and ^"^^(t, r') are all identical for all 
edges out-going from the transmitting node t. 

Bit-constrained communication (BC): Suppose that the channel from node v' to v is bit- 
constrained, so that one can transmit at most B bits, which is then subjected to random dithering. 
Under these assumptions, the communication function takes the form 

J^{0iv'),av',v)) = QB{eiv') + Civ',v)), (9) 

where Qb{') represents the -B-bit quantization function with maximum value M and S,{v',v) is 
random dithering. We assume that the random dithering is applied prior to transmission across 
the channel out-going from vertex v' , so that C{v',v) = S,{v') is the same random variable across 
all neighbors v G N{v'). 

3 Main result and consequences 

In this section, we first state our main result, concerning the stochastic behavior of sequence {0"} 
generated by the updates ([5]). We then illustrate its consequences for the specific communication 
and noise models described in Section 12.21 and conclude with a discussion of behavior for specific 
graph structures. 
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3.1 Statement of main result 



Consider the factor LqY that drives the updates ([5]). An important element of our analysis is the 
conditional covariance of this update factor, denoted by S = S/, and given by 



E 



{L Y{e,z)) 1 F (L Y{e,z)f \ e -Le{L6f. 

A little calculation shows that the element of this matrix is given by 

m 

^eihj) = L(i,A:)L(i,£) E[y(i,A:)y(j,£)-0(fc)0(£) I 0]. 



(10) 



(11) 



Moreover, the eigenstructure of the consensus matrix L plays an important role in our analysis. 
Since it is symmetric and positive semidefinite, we can write 



L 



UJU^, 



(12) 



where U he an m x m orthogonal matrix with columns defined by unit-norm eigenvectors of L, and 
J : = diag{Ai(L), . . . , X^iL)} is a diagonal matrix of eigenvalues, with 







Ai(L) < A2(L) <... < A„(L). 



(13) 



It is convenient to let U denote the m x (m — 1) matrix with columns defined by eigenvectors 
associated with positive eigenvalues of L — that is, excluding column Ui = 1/||1||2, associated with 
the zero-eigenvalue Ai(L) = 0. With this notation, we have 



J = diag{A2(L),...,A„,(L)} = U^LU. 



(14) 



Theorem 1. Consider the random sequence {9^} generated by the update ([5]) for some communi- 
cation function T , consensus matrix L, and step size parameter e„ = G(l/n). 

(a) In all cases, the sequence {9^} is a strongly consistent estimator of 9* = xl, meaning that 
gn _^ Q* di^Qgi surely (a.s.). 

(h) Furthermore, if the second smallest eigenvalue of the consensus matrix L satisfies \2{L) > 1/2 
then 



V^{9n -9*)^n[q, 




P 



(15) 



where the (m — l) x (m — 1) matrix P is the solution of the continuous time Lyapunov equation 

T 



(16) 



where J is the diagonal matrix (I14p . and 
conditional covariance (|l(jp . 



U'^HntU is the transformed version of the 
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Theorem [Dj^a) asserts that the sequence {6^} is a strongly consistent estimator of the average. As 
opposed to weak consistency, this result guarantees that for almost any realization of the algorithm, 
the associated sample path converges to the exact consensus solution. Theorem[T]^b) establishes that 
for appropriate choices of consensus matrices, the rate of MSE convergence is of order 1/n, since 
the -y/n-rescaled error converges to a non-degenerate Gaussian limit. Such a rate is to be expected 
in the presence of sufficient noise, since the number of observations received by any given node (and 
hence the inverse variance of estimate) scales as n. The solution of the Lyapunov equation (jl6p 
specifies the precise form of this asymptotic covariance, which (as we will see) depends on the graph 
structure. 



3.2 Some consequences 

Theorem [1] can be specialized to particular noise and communication models. Here we derive some 
of its consequences for the AEN, ANN and BC models. For any model for which Theorem [U^b) 
holds, we define the average mean-squared error as 

AMSE(L;r) := — trace(P(r )), (17) 
m 

corresponding to asymptotic error variance, averaged over nodes of the graph. 

Corollary 1 (Asymptotic MSE for specific models). Given a consensus matrix L with second- 
smallest eigenvalue \2{L) > ^, the sequence {0"'} is a strongly consistent estimator of the average 
6*, with asymptotic MSE characterized as follows: 

(a) For the additive edge-based noise (AENj model ([7]).' 

max J2kj^jL^U,k)' 

=l,...,m '■' 



2 ™ 

AMSE(L;r) < — y 

i=2 



2\i{L) - 1 



(18) 



(h) For the additive node-based noise (ANNj model ([8]) and the bit- constrained (BC) model 

2 ™ 

AMSE(L;r) = — V 



(19) 



m 

1=2 



2Xi{L) - 1 



where the variance term cr^ is given by the quantization noise E [Qsid + 0^ — | ^] for the 
BC model, and the noise variance Var(^(v')) for the ANN model. 

Proof. The essential ingredient controlling the asymptotic MSE is the conditional covariance matrix 
which specifies P via the Lyapunov equation (jl6p . For analyzing model AEN, it is useful to 
establish first the following auxiliary result. For each z = l,...,m — 1, we have 

Pa < , ***^r!^ > (20) 

- 2A,+i(L)-l' ^ ' 

where |||2 = |||S|||2 is the spectral norm (maximum eigenvalue for a positive semdefinite sym- 
metric matrix). To see this fact, note that 

f/^sc/ ^ u^[\im2i]u = 111S1112/. 
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Since P satisfies the Lyapunov equation, we have 



^/^ I'^ 



Note that the diagonal entries of the matrix ^ J — |^ P+P ^ J — |^ are of the form (2Aj+i — 1) Pa. 
The difference between the RHS and LHS matrices constitute a positive semidefinite matrix, which 
must have a non-negative diagonal, implying the claimed inequality ()20p . 

In order to use the bound ([20|) . it remains to compute or upper bound the spectral norm |||S|||2, 
which is most easily done using the elementwise representation (jlip . 

(a) For the AEN model dH), we have 

E[Y{i,k)Y{j,^)-^^e,\ e\ = E[ai,k)c{j,i)] . (21) 

Since we have assumed that the random variables k) on each edge (i, k) are i.i.d., with zero-mean 
and variance a^, we have 



E[Y{i,k)Y{j,£)-e{k)e{£) 



0-2 if {i, k) = (j, £) and i ^ j 
otherwise. 



Consequently, from the elementwise expression (lllh . we conclude that S is diagonal, with entries 

SO that |||S|||2 = cr^ ™&Xi=i,.--,m Sfc^^j -^jfci which establishes the claim ([T8]). 

(b) For the BC model ([9]), we have 

r , , , , I if i = 7 and k = £ , , 

E[Y{i,k)Y{j,£)-e{k)e{£) \ 6] = \ ^ (22) 

10 otherwise, 

where dq^^- : = E + 0^ - 6*2 I 6*] is the quantization noise. Therefore, we have E(0*) = 

cTqj^^L^, and using the fact that U consists of eigenvectors of L (and hence also L^, the Lyapunov 
equation (jl6p takes the form 

~ ~ (T'^ A? (L) 

which has the explicit diagonal solution P with entries Pa = 2X'+i(l)-i • Computing the asymptotic 

MSB yields the claim The proof of the same claim for the ANN model is 

analogous. □ 
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3.3 Scaling behavior for specific graph classes 

We can obtain further insight by considering Corohary [1] for specific graphs, and particular choices 
of consensus matrices L. For a fixed graph G, consider the graph Laplacian L{G) defined in 
equation ([3]). It is easy to see that L{G) is always positive semi-definite, with minimal eigenvalue 
\i{L[G)) = 0, corresponding to the constant vector. For a connected graph, the second smallest 
eigenvector L[G) is strictly positive [?]• Therefore, given an undirected graph G that is connected, 
the most straightforward manner in which to obtain a consensus matrix L satisfying the conditions 
of Corollary [T] is to rescale the graph Laplacian L{G), as defined in equation ([3]), by its second 
smallest eigenvalue \2{L(G)), thereby forming the rescaled consensus matrix 

with A2(i?(G)) = 1 > ^. 

With this choice of consensus matrix, let us consider the implications of Corollary [IJb), in 
application to the additive node-based noise (ANN) model, for various graphs. We begin with a 
simple lemma, proved in Appendix [Aj showing that, up to constants, the scaling behavior of the 
asymptotic MSE is controlled by the second smallest eigenvalue A2(-^^(G)). 

Lemma 1. For any connected graph G, using the rescaled Laplacian consensus matrix (]23p . the 
asymptotic MSE for the ANN model ([8]) satisfies the hounds 

v2 ^2 



a 



2A,(L(0)) ^ ™SE(fl(G);r) < ^j^, (24) 

where \2{L[G)) is the second smallest eigenvalue of the graph. 

Combined with known results from spectral graph theory [7], Lemma [Hallows us to make specific 
predictions about the number of iterations required, for a given graph topology of a given size m, 
to reduce the asymptotic MSE to any 5 > 0: in particular, the required number of iterations scales 
as 

Note that this scaling is similar but different from the scaling of noiseless updates [H [9], where the 

i°g(i/^) 

-log(l-A2(L(G)))- 



MSE is (with high probability) upper bounded by (5 for n = Q ( _ ingn - A^ff (c))) ) ' '^hich scales as 



„ , logd/S) \ 



for a decaying spectral gap X2{L{G)) 0. 



3.4 Illustrative simulations 

We illustrate the predicted scaling (|25|) by some simulations on different classes of graphs. For all 
experiments reported here, we set the step size parameter e„ = :f;;^riQQ- The additive offset serves 
to ensure stability of the updates in very early rounds, due to the possibly large gain specified by 
the rescaled Laplacian (123p . We performed experiments for a range of graph sizes, for the additive 
node noise (ANN) model ([8]), with noise variance = 0.1 in all cases. For each graph size m, we 
measured the number of iterations n required to reach a fixed level 6 of mean-squared error. 
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3.4.1 Cycle graph 

Consider the ring graph Cm on m vertices, as ihustrated in Figure[2]^a). Panel (b) provides a log-log 
plot of the MSE versus the iteration number n; each trace corresponds to a particular sample path. 
Notice how the MSE over each sample coverges to zero. Moreover, since Theorem [T] predicts that 
the MSE should drop off as 1/n, the linear rate shown in this log-log plot is consistent. Figure [2]^c) 
plots the number of iterations (vertical axis) required to achieve a given constant MSE versus the 
size of the ring graph (horizontal axis). For the ring graph, it can be shown (see Chung [7|) that 
the second smallest eigenvalue scales as X2{L{Cm)) = @{l/m?), which implies that the number of 
iterations to achieve a fixed MSE for a ring graph with m vertices should scale as n = 0(m^). 
Consistent with this prediction, the plot in Figure ^c) shows a quadratic scaling; in particular, 
note the excellent agreement between the theoretical prediction and the data. 




steps ^'^'^ 

(a) (b) (c) 

Figure 2. Comparison of empirical simulations to theoretical predictions for the ring graph in panel 
(a), (b) Sample path plots of log MSE versus log iteration number: as predicted by the theory, the 
log MSE scales linearly with log iterations, (c) Plot of number of iterations (vertical axis) required to 
reach a fixed level of MSE versus the graph size (horizontal axis) . For the ring graph, this quantity 
scales quadratically in the graph size, consistent with Corollary [TJ 



3.4.2 Lattice model 

Figure[3]^a) shows the two-dimensional four nearest-neighbor lattice graph with m vertices, denoted 
Fm- Again, panel (b) corresponds to a log-log plot of the MSE versus the iteration number n, with 
each trace corresponding to a particular sample path, again showing a linear rate of convergence 
to zero. Panel (c) shows the number of iterations required to achieve a constant MSE as a function 
of the graph size. For the lattice, it is known [7] that \2{L{Fm)) = ©(l/m), which implies that the 
critical number of iterations should scale as n = G(m). Note that panel (c) shows linear scaling, 
again consistent with the theory. 

3.4.3 Expander graphs 

Consider a bipartite graph G = (Fl, V2,-E'), with m = |Vi| -|- IV2I vertices and edges joining only 
vertices in Vi to those in ¥2-, and constant degree d; see Figure H)^ a) for an illustration with d = 2>. 
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Sieps f^oties 

(a) (b) (c) 

Figure 3. Comparison of empirical simulations to theoretical predictions for the four nearest- 
neighbor lattice (panel (a)), (b) Sample path plots of log MSE versus log iteration number: as 
predicted by the theory, the log MSE scales linearly with log iterations, (c) Plot of number of 
iterations (vertical axis) required to reach a fixed level of MSE versus the graph size (horizontal axis). 
For the lattice, graph, this quantity scales linearly in the graph size, consistent with Corollary [TJ 



A bipartite graph of this form is an expander [HElEj with parameters a, 5 G (0) 1)) if foi^ subsets 
S* C Vi of size |5| < a|Vi|, the neighborhood set of S — namely, the subset 

N{S) := {teV2\{s,t) for some sG 5}, 

has cardinality |A^(5')| > (^dlS*!. Intuitively, this property guarantees that each subset of Vi, up 
to some critical size, "expands" to a relatively large number of neighbors in V2. (Note that the 
maximum size of |A'"(S')| is d\S\, so that 5 close to 1 guarantees that the neighborhood size is close to 
its maximum, for all possible subsets S.) Expander graphs have a number of interesting theoretical 
properties, including the property that \2[L{Krn)) = ©(1) — that is, a bounded spectral gap [HE]. 

In order to investigate the behavior of our algorithm for expanders, we construct a random bi- 
partite graph as follows: for an even number of nodes m, we split them into two subsets Vi^i = 1,2, 
each of size m/2. We then fix a degree d, construct a random matching on nodes, and use it 
connect the vertices in Vi to those in V2. This procedure forms a random bipartite d-regular graph; 
using the probabilistic method, it can be shown to be an edge-expander with with probability 
1 — 0(1), as the graph size tends to infinity [H 111] . 

Given the constant spectral gap \2{L{Km)) = ©(1)) the scaling in number of iterations to 
achieve constant MSE is n = ©(1). This theoretical prediction is compared to simulation results in 
Figure m note how the number of iterations soon settles down to a constant, as predicted by the 
theory. 

4 Proof of Theorem [1] 

We now turn to the proof of Theorem [H The basic idea is to relate the behavior of the stochastic 
recursion ([5]) to an ordinary differential equation (ODE), and then use the ODE method [15] to 
analyze its properties. The ODE involves a function t ^ 9t ^ M™, with its specific structure 
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(a) (b) (c) 

Figure 4. Comparison of empirical simulations to theoretical predictions for the bipartite expander 
graph in panel (a), (b) Sample path plots of log MSE versus log iteration number: as predicted by 
the theory, the log MSE scales linearly with log iterations, (c) Plot of number of iterations (vertical 
axis) required to reach a fixed level of MSE versus the graph size (horizontal axis). For an expander, 
this quantity remains essentially constant with the graph size, consistent with Corollary [TJ 



depending on the communication and noise model under consideration. For the AEN and ANN 
models, the relevant ODE is given by 



dt 

For the BC model, the approximating ODE is given by 



(27) 



_i = _L CM{et) with Cm{u) : = { 



u if |n| < M 
-M iiu<-M 
+M iiu>+M. 



(28) 



In both cases, the ODE must satisfy the initial condition ^o(^^) = x{v). 



4.1 Proof of Theorem [T](a) 

The following result connects the discrete-time stochastic process {^"} to the deterministic ODE 
solution, and estabhshes Theorem [1]^ a): 

Lemma 2. The ODEs (j27p and ()28p each have 6* = xl as their unique stable fixed point. Moreover, 
for all 5 > 0, we have 

P Aim sup \\9n-0tJ>6] = 0, for t„ = Y.k=i h (29) 

\ n— ♦oo / 

which implies that 0" — > 6* almost surely. 

Proof. We prove this lemma by using the ODE method and stochastic approximation — in partic- 
ular, Theorem 1 from Kushner and Yin [14j . which connects stochastic recursions of the form ([5]) 
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to the ordinary differential equation dOt/dt = E^[n (L ^)) | 9t\- Using the definition of Y 
in terms of J- , for the AEN and ANN models, we have 

^^[T{e{v),ay,r)) I e{v)] = e{v), 

from which we conclude that with the stepsize choice em = 0(1/?ti), we have 

^i:[n {LQY{et,0) I Ot] = -L6t. 

By our assumptions on the eigenstructure of L, the system dOt/dt = —L Ot is globally asymptotically 
stable, with a line of fixed points {6 G | = 0}. Given the initial condition 9q{v) = x{v), we 
conclude that 6* = xl is the unique asymptotically fixed point of the ODE, so that the claim (j29p 
follows from Kushner and Yin |14] . 

For the BC model, the analysis is somewhat more involved, since the quantization function 
saturates the output at itM. For the dithered quantization model ([9]), we have 

E^[n{LQY{9t,C))\ Ot] = -LCM{0t), 

where Cm{') is the saturation function (j28p . We now claim that 9* is also the unique asymptotically 
stable fixed point of the ODE d9t/dt = —LCMiGt) subject to the initial condition 9o{v) = x{v). 
Consider the eigendecomposition L = UJU'^, where J = diag{0, A2(i), • • • , Am(-^^)}- Define the 
rotated variable 7t : = U'^9t, so that the ODE ([28]) can be re-written as 

d-{t{l)/dt = (30a) 
d^t{k)/dt = -Xk{L)UlCM{Ujt), for fc = 2,...,m, (30b) 

where denotes the A;*'* column of U. 

Note that Ui = l/||l||2, since it is associated with the eigenvalue Ai(L) = 0. Consequently, the 
solution to equation (|30ap takes the form 

7^(1) = U{9o = V^x, (31) 

with unique fixed point 7*(1) = ^/nix, where x := ^ x{i) is the average value, 

A fixed point 7* G M."^ for equations ()30bp requires that U'^Cm(U^*) = 0, for k = 2, . . . ,m. 
Given that the columns of U form an orthogonal basis, this implies that Cm(C^7*) = ctl for some 
constant a G M, or equivalently (given the connection C/7* =9*) 

Cm (9*) = al. (32) 

Given the piecewise linear nature of the saturation function, this equality implies either that the 
fixed point satisfies the elementwise inequality 9* > M (li a = M); or the elementwise inequality 
9* < -M (if a = -M); or as the final option, the 9* = a when a G (-M, +M). But from 
equation ([31]), we know that 7*(1) = y/mx G [-M ^/m, +My/m]. But we also have 7*(1) = 9* 
by definition, so that putting together the pieces yields 

19* 

-M < < M, (33) 

m 

Thus the only possibility is that 9* = al for some constant a G {—M,+M), and the relation 
C/7* = al implies that a = ^*{l)/^/m = x, which establishes the claim. □ 
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4.2 Proof of Theorem [T](b) 

We analyze the update ([5]) using results from Benveniste et al [1]. In particular, given the stochastic 
iteration 6'"+i = 6l"+e„i?(6'", define the expectation h{e) = E [H{e,X)], its Jacobian matrix 

V/i(6>), and the covariance matrix T,{e) = E [{H{e,X) - h{e)){H{e,X) - h{e)f]. Then Theorem 
3 (p. 110) of Benveniste et al [1] asserts that as long as the eigenvalues A(V/i(0)) are strictly below 
-1/2. then 

V^{9n-e*)SN{0,Q), (34) 
where the covariance matrix Q is the unique solution to the Lyapunov equation 

^ + Vh{e*)]Q + Q(^ + Vh{9*)] +Se*=0. (35) 



We begin by computing the conditional distribution h{9); for the models AEN and ANN it 
takes the form 

h{e) = -19 (36) 

since the conditional expectation of the random matrix Y is given by E[y | 9] = 9 1. For the BC 
model, since the quantization is finite with maximum value M, the expectation is given by 

h{9) = -L Cm{9) (37) 



where the saturation function was defined previously (j28p . In addition, we computed form of the 
the covariance matrix S^* previously (fTOl) . Finally, we note that 

\/h{9*) = -L (38) 

for all three models. (This fact is immediate for models AEN and ANN; for the BC model, note 
that Theorem [T]^a) guarantees that 9* falls in the middle linear portion of the saturation function.) 

We cannot immediately conclude that asymptotic normality (I34p holds, because the matrix L 
has a zero eigenvalue (Ai(L) = 0). However, let us decompose L = UJU'^ where U is the matrix 
with unit norm columns as eigenvectors, and J = diag{0, X2{L), . . . , Xm{L)}. Let U denote the 
m X (m — 1) matrix obtained by deleting the first column of U. Defining the (m — 1) vector 
P"- = U'^9"', we can rewrite the update in (m — l)-dimensional space as 



pn+l ^ pn^lljjT(^^Q y"+l(0")) rj , (39) 



n L 



for which the new effective h function is given by h(f3) = — J/3, with J = diag{A2(-Z^), • • • , XmiL)}- 
Since A2(-C') > ^ by assumption, the asymptotic normality (fMI) applies to this reduced iteration, so 
that we can conclude that 



/ri (/?"-/?*) A N{0,P) 
where P solves the Lyapunov equation 

J-i^P + p(j-i^ = U^^nM. 
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We conclude by noting that the asymptotic covariance of O"" is related to that of /3" by the relation 




P 



u, 



(40) 



from which Theorem [T]^b) follows. 



5 Discussion 

This paper analyzed the convergence and asymptotic behavior of distributed averaging algorithms 
on graphs with general noise models. Using suitably damped updates, we showed that it is possible 
to obtain exact consensus, as opposed to approximate or near consensus, even in the presence of 
noise. We guaranteed almost sure convergence of our algorithms under fairly general conditions, 
and moreover, under suitable stability conditions, we showed that the error is asymptotically nor- 
mal, with a covariance matrix that can be predicted from the structure of the consensus operator. 
We provided a number of simulations that illustrate the sharpness of these theoretical predictions. 
Although the current paper has focused exclusively on the averaging problem, the methods of anal- 
ysis in this paper are applicable to other types of distributed inference problems, such as computing 
quantiles or order statistics, as well as computing various types of M-estimators. Obtaining analo- 
gous results for more general problems of distributed statistical inference is an interesting direction 
for future research. 
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A Proof of Lemma [T] 

We begin by noting that for the normalized graph Laplacian L{G), it is known that for any graph, 
the second smallest eigenvalue satisfies the upper bound X2{L{G)) < m/[m — 1) < 1. Moreover, 
we have trace(L(G)) = m. See Lemma 1.7 in Chung [7] for proofs of these claims. 

Using these facts, we establish Lemma [T] as follows. Recall that by construction, we have 
R{G) = xTi^TT' ^° ^^^^ the second smallest eigenvalue of R{G) is \2{R{G)) = 1, and the remaining 
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eigenvalues are greater than or equal to one. Applying Corollary [T] to the ANN model, we have 



AMSE(L;i 



9 m 

- E 



1=2 



m\2{L{G)) ^ 



[HR{G))f 

2\{R{G)) - 1 

m 

E 



i=2 



ML{G))f 



2\{L{G)) - \2{L{G)) 



> 



2\2{L{G)) m 
2A2(L(G)) 



trace(L(G)) 



In the other direction, using the fact that \2{R{G)) > 1 and the bound 2x-i 



using the fact that trace(L(G)) = m. 
I: 

have 

AMSE(L; 9* 



< X for X > 1. we 



9 m 

- E 



i=2 



2 1 



2\{R{G)) - 1 



a 



< — trace(i?(G)) 
m 

^2 

■trace(L(G)) 



a 



\2{L{G))m 

,2 
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